Voice Conversion Based on Trajectory Model Training of Neural Networks Considering Global Variance
Abstract
This paper proposes a new training method for deep neural networks (DNNs) in statistical voice conversion. DNNs are now used as conversion models that represent the mapping from source features to target features. However, conventional DNN-based voice conversion has two major problems: 1) the inconsistency between the training and synthesis criteria, and 2) the over-smoothing of the generated parameter trajectories. In this paper, we introduce a parameter trajectory generation process that considers the global variance (GV) into the training of DNNs for voice conversion. A consistent framework that uses the same criterion for both training and synthesis provides better conversion accuracy in the original static feature domain, and over-smoothing can be avoided by optimizing the DNN parameters on the basis of the trajectory likelihood considering the GV. Experimental results show that the proposed method outperforms the conventional DNN-based method in terms of both speech quality and speaker similarity.
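To make the over-smoothing problem concrete, here is a minimal Python sketch. It is not the paper's implementation. It assumes the common definition of GV as the per-dimension variance of a feature trajectory over the frames of one utterance. The function names, the moving-average smoother (a stand-in for the smoothing effect of a statistical model), and the toy data are all hypothetical.

```python
def global_variance(trajectory):
    """Per-dimension variance over time of a list of feature frames."""
    num_frames = len(trajectory)
    num_dims = len(trajectory[0])
    gv = []
    for d in range(num_dims):
        values = [frame[d] for frame in trajectory]
        mean = sum(values) / num_frames
        gv.append(sum((v - mean) ** 2 for v in values) / num_frames)
    return gv


def moving_average(trajectory, half_width=1):
    """Smooth each dimension over time (assumed stand-in for over-smoothing)."""
    num_frames = len(trajectory)
    smoothed = []
    for t in range(num_frames):
        lo = max(0, t - half_width)
        hi = min(num_frames, t + half_width + 1)
        window = trajectory[lo:hi]
        # Average every dimension across the frames inside the window.
        smoothed.append([sum(col) / len(window) for col in zip(*window)])
    return smoothed


# Toy one-dimensional trajectory (made-up numbers, not data from the paper).
natural = [[0.0], [2.0], [0.0], [2.0], [0.0], [2.0]]
oversmoothed = moving_average(natural)

print("GV of toy natural trajectory:", global_variance(natural))
print("GV of smoothed trajectory:   ", global_variance(oversmoothed))
```

In this toy case the smoothed trajectory keeps the same mean but has a much smaller GV. This kind of variance shrinkage is what a GV-aware training criterion is meant to counteract.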
Similar resources
Estimating and modeling monthly mean daily global solar radiation on horizontal surfaces using artificial neural networks
In this study, an artificial neural network based model for predicting the solar energy potential of Kerman province in Iran has been developed. Meteorological data from 12 cities for a period of 17 years (1997–2013) and solar radiation data for five cities in and around Kerman province, taken from the Iranian Meteorological Office data center, were used for training and testing the network. Meteorologic...
Statistical singing voice conversion based on direct waveform modification with global variance
This paper presents techniques to improve the quality of voices generated through statistical singing voice conversion with direct waveform modification based on spectrum differential (DIFFSVC). The DIFFSVC method makes it possible to convert singing voice characteristics of a source singer into those of a target singer without using vocoder-based waveform generation. However, quality of the co...
Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks
We propose a parallel-data-free voice conversion (VC) method that can learn a mapping from source to target speech without relying on parallel data. The proposed method is general-purpose, high quality, and parallel-data-free, which works without any extra data, modules, or alignment procedure. It is also noteworthy that it avoids over-smoothing, which occurs in many conventional statistical mode...
Implementation of Computationally Efficient Real-Time Voice Conversion
This paper presents an implementation of real-time processing of statistical voice conversion (VC) based on Gaussian mixture models (GMMs). To develop VC applications for enhancing our human-to-human speech communication, it is essential to implement real-time conversion processing. Moreover, it is useful to reduce computational complexity of the conversion processing for making VC applications...
Link Prediction using Network Embedding based on Global Similarity
Background: Link prediction is one of the most widely studied problems in complex network analysis. It requires knowing the history of previous link connections and combining it with the available information. Local link prediction approaches based on node structure are fast but not accurate enough. On the other hand, the global link predicti...
Publication date: 2016